1 Introduction

Iterative Linear Association Analysis (ILAA) is a computational method that creates a linear transformation from a sample of multidimensional data that effectively removes linear associations between data variables. The returned transformation matrix can be used to:

  1. Do an exploratory analysis of latent variables and their association to the observed variables

  2. Do exploratory discovery of latent variables associated with a specific outcome target

  3. Address multicollinearity issues in linear regression models

    1. Better estimation and interpretation of model variables

    2. Improve linear model performance

  4. Simplify the multidimensional search space for many ML algorithms

The objective of this tutorial is to guide users in using ILAA to accomplish the tasks listed above.

1.1 The Libraries

ILAA is a wrapper around the more general data decorrelation algorithm (IDeA), which is implemented in R. Both are part of the FRESA.CAD 3.4.6 package.

## From GitHub
# First install the devtools package
#library(devtools)
#install_github("joseTamezPena/FRESA.CAD")

## For ILAA
library("FRESA.CAD")

## For network analysis
library(igraph)

## For multicollinearity
library(multiColl)
library(car)

## For string handling (str_remove_all() and str_remove() are used below)
library(stringr)

2 Test Data

For this tutorial I’ll use the body-fat prediction data set. The data was downloaded from Kaggle:

https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset

The Kaggle data disclaimer:

“Source The data were generously supplied by Dr. A. Garth Fisher who gave permission to freely distribute the data and use for non-commercial purposes.

Roger W. Johnson Department of Mathematics & Computer Science South Dakota School of Mines & Technology 501 East St. Joseph Street Rapid City, SD 57701

email address: web address: http://silver.sdsmt.edu/~rwjohnso

2.1 Loading the Data

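The following code snippet loads the data and removes the density column. It also computes the Body Mass Index (BMI) from the weight (in pounds) and the height (in inches), and removes the subjects whose BMI is above 50, which are treated as data errors.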

body_fat <- read.csv("~/GitHub/LatentBiomarkers/Data/BodyFat/BodyFat.csv", header=TRUE)

### Removing density as estimator
body_fat$Density <- NULL

body_fat$BMI <- 10000*body_fat$Weight*0.453592/((body_fat$Height*2.54)^2)
## Removing subjects with data errors
body_fat <- body_fat[body_fat$BMI<=50,]
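The BMI line above converts pounds to kilograms (factor 0.453592) and inches to centimeters (factor 2.54); the factor 10000 converts cm² to m². The following is a minimal sketch of the same arithmetic. The helper name `bmi_from_imperial()` and the input values are illustrative assumptions, not part of FRESA.CAD or of the data set.

```r
# Hypothetical helper (not part of the tutorial code): BMI from pounds and inches
bmi_from_imperial <- function(weight_lb, height_in) {
  weight_kg <- weight_lb * 0.453592   # pounds -> kilograms
  height_cm <- height_in * 2.54       # inches -> centimeters
  10000 * weight_kg / height_cm^2     # 10000 converts cm^2 to m^2
}

# Illustrative values only (not taken from the body-fat data)
bmi_value <- bmi_from_imperial(weight_lb = 180, height_in = 70)

# The same quantity computed directly in kg/m^2
bmi_check <- (180 * 0.453592) / (70 * 0.0254)^2

print(c(bmi_value, bmi_check))
```

Both numbers should agree, which confirms that the expression used for `body_fat$BMI` is the usual kg/m² definition.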

3 ILAA Unsupervised Processing

The ILAA function is:

 decorrelatedData <- ILAA(data=NULL,
                          thr=0.80,
                          method=c("pearson","spearman"),
                          Outcome=NULL,
                          drivingFeatures=NULL,
                          maxLoops=100,
                          verbose=FALSE,
                          bootstrap=0
                          )

where:

3.1 ILAA Auxiliary Functions

To help users take advantage of the ILAA-transformed object, FRESA.CAD provides the following auxiliary functions:

newTransformedData   <- predictDecorrelate(decorrelatedData,NewData)
theBetaCoefficients  <- getLatentCoefficients(decorrelatedData)
fromLatenttoObserved <- getObservedCoef(decorrelatedData,latentModel)

  • predictDecorrelate() rotates any new data set based on the output of an ILAA-transformed data set.

  • getLatentCoefficients() returns a list of all the beta coefficients for each of the discovered latent variables.

  • getObservedCoef() returns the beta coefficients in the observed space of any linear model that was trained in the UPLTM-transformed (latent) space.
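Conceptually, rotating new data amounts to multiplying the observed features by a stored transformation matrix. The base R sketch below illustrates that idea with a toy matrix whose rows are observed features and whose columns are output features. The coefficients and data values are hypothetical, and the sketch is not the implementation of predictDecorrelate(), which may perform additional steps.

```r
# Toy transformation matrix: rows = observed features, columns = output features.
# Hypothetical coefficients, chosen only for illustration.
W <- diag(3)
dimnames(W) <- list(c("Weight", "Neck", "Wrist"),
                    c("Weight", "Neck", "La_Wrist"))
W["Weight", "La_Wrist"] <- -0.012
W["Neck",   "La_Wrist"] <- -0.165

# New observations that were not used to estimate W (made-up values)
new_data <- data.frame(Weight = c(150, 200),
                       Neck   = c(36, 40),
                       Wrist  = c(17, 19))

# Apply the stored transformation to the new observations
new_latent <- as.matrix(new_data[, rownames(W)]) %*% W
print(new_latent)
```

Because the matrix is estimated once and then reused, the training and test sets are mapped into the same latent space.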

3.2 Sample Usage

By default, the ILAA function targets a maximum correlation of 0.8 using the Pearson correlation measure. The user can instead choose robust fitting with the Spearman correlation measure, and/or lower the threshold to reduce the remaining level of feature association. The following snippet shows the different options.


# Default call
body_fat_Decorrelated <- ILAA(body_fat)
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
BodyFat  Age  Ankle  Forearm  Wrist  Weight  La_Biceps  La_Neck
1        1    1      1        1      1       0.359      0.303

La_Knee  La_Thigh  La_BMI  La_Chest  La_Abdomen  La_Hip  La_Height
0.272    0.242     0.211   0.171     0.15        0.11    0.0209

# Explore the convergence metrics in verbose mode
body_fat_Decorrelated <- ILAA(body_fat,verbose=TRUE)

fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860

3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959

4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735

5 <R=0.735,thr=0.800>

[ 5 ], 0.4782625 Decor Dimension: 10 Nused: 10 . Cor to Base: 7 , ABase: 15 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
BodyFat  Age  Ankle  Forearm  Wrist  Weight  La_Biceps  La_Neck
1        1    1      1        1      1       0.359      0.303

La_Knee  La_Thigh  La_BMI  La_Chest  La_Abdomen  La_Hip  La_Height
0.272    0.242     0.211   0.171     0.15        0.11    0.0209

# Robust Linear Fitting with the Spearman correlation measure
body_fat_Decorrelated <- ILAA(body_fat,method="spearman",verbose=TRUE)

spearman | RLM | Weight BodyFat Age Weight Height Neck Chest 0.46666667 0.06666667 1.00000000 0.13333333 0.53333333 0.80000000

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.929,thr=0.900>, Top: 2< 1 >Fa= 2,<><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.872

2 <R=0.872,thr=0.800>, Top: 1< 4 >Fa= 2,<><>Tot Used: 8 , Added: 4 , Zero Std: 0 , Max Cor: 0.837

3 <R=0.837,thr=0.800>, Top: 1< 1 >Fa= 2,<><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.990

4 <R=0.990,thr=0.950>, Top: 1< 1 >Fa= 2,<><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.781

5 <R=0.781,thr=0.800>

[ 5 ], 0.4849516 Decor Dimension: 9 Nused: 9 . Cor to Base: 6 , ABase: 15 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
BodyFat  Age  Ankle  Biceps  Forearm  Wrist  Weight  La_Neck  La_Knee
1        1    1      1       1        1      1       0.303    0.273

La_Thigh  La_BMI  La_Chest  La_Abdomen  La_Hip  La_Height
0.242     0.212   0.174     0.15        0.11    0.0221

# Lowering the threshold
body_fat_Decorrelated <- ILAA(body_fat,thr=0.4,verbose=TRUE)

fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860

3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959

4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735

5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631

6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501

7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584

8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478

9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421

10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375

11 <R=0.375,thr=0.400>

[ 11 ], 0.3726062 Decor Dimension: 14 Nused: 14 . Cor to Base: 12 , ABase: 15 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
Age  Weight  La_Ankle  La_Forearm  La_Wrist  La_Biceps  La_BodyFat
1    1       0.624     0.602       0.459     0.359      0.309

La_Neck  La_Knee  La_BMI  La_Thigh  La_Abdomen  La_Hip  La_Chest  La_Height
0.303    0.272    0.211   0.194     0.15        0.11    0.108     0.0209

# Trying to achieve the maximum independence between variables, i.e., thr=0.0
body_fat_Decorrelated <- ILAA(body_fat,thr=0.0,verbose=TRUE)

fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860

3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959

4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735

5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631

6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501

7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584

8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478

9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421

10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375

11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384

12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343

13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335

14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291

15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252

16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207

17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217

18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221

19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189

20 <R=0.189,thr=0.147>, Top: 5< 2 >Fa= 9,<|><>Tot Used: 15 , Added: 4 , Zero Std: 0 , Max Cor: 0.157

21 <R=0.157,thr=0.147>, Top: 5< 1 >Fa= 9,<><>Tot Used: 15 , Added: 0 , Zero Std: 0 , Max Cor: 0.157

[ 21 ], 0.1572866 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
Weight  La_Ankle  La_Forearm  La_Age  La_Wrist  La_Biceps  La_BodyFat
1       0.54      0.518       0.483   0.403     0.32       0.29

La_Neck  La_Knee  La_BMI  La_Thigh  La_Abdomen  La_Chest  La_Hip  La_Height
0.269    0.228    0.211   0.183     0.128       0.101     0.0993  0.0209

For the rest of the tutorial I’ll set the correlation goal to 0.2 in verbose mode.


# Calling ILAA to achieve a final correlation of 0.2
body_fat_Decorrelated <- ILAA(body_fat,thr=0.2,verbose=TRUE)

fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860

3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959

4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735

5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631

6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501

7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584

8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478

9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421

10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375

11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384

12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343

13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335

14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291

15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252

16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207

17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217

18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221

19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189

20 <R=0.189,thr=0.200>

[ 20 ], 0.188603 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
Weight  La_Ankle  La_Forearm  La_Age  La_Wrist  La_Biceps  La_BodyFat
1       0.555     0.518       0.483   0.403     0.32       0.29

La_Neck  La_Knee  La_BMI  La_Thigh  La_Abdomen  La_Chest  La_Hip  La_Height
0.279    0.228    0.211   0.183     0.132       0.104     0.0993  0.0209

3.3 Data Frame Attributes

The returned data frame has the following attributes:

  attr(body_fat_Decorrelated,"UPLTM")            #The transformation matrix
  attr(body_fat_Decorrelated,"fscore")           #The score of each feature
  attr(body_fat_Decorrelated,"drivingFeatures")  #The list of driving features
  attr(body_fat_Decorrelated,"unaltered")        #The list of unaltered features
  attr(body_fat_Decorrelated,"LatentVariables")  #The list of latent variables
  attr(body_fat_Decorrelated,"R.critical")       #The estimated minimum achievable correlation
  attr(body_fat_Decorrelated,"IDeAEvolution")    #Evolution of the algorithm
  attr(body_fat_Decorrelated,"VarRatio")         #Variance Ratios: var(Latent)/Var(obs)

The main attribute is “UPLTM”, which stores the linear transformation matrix from the observed variables to the latent variables.

The next relevant attribute is “VarRatio”, which stores the fraction of the original feature variance that is still present in the latent variable. All unaltered variables return a “VarRatio” of 1.
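To make the variance ratio concrete, the sketch below computes var(Latent)/var(Observed) for a single simulated pair of variables. It uses an ordinary lm() fit as a stand-in for one association model; the variable names and simulated values are assumptions for illustration only and do not reproduce the ILAA procedure.

```r
set.seed(1)
n <- 200

# Simulated data: hip strongly depends on weight (illustrative values only)
weight <- rnorm(n, mean = 180, sd = 30)
hip    <- 60 + 0.2 * weight + rnorm(n, sd = 2)

# Stand-in association model for the parent variable
fit <- lm(hip ~ weight)

# Latent variable: the parent minus the part explained by weight
la_hip <- hip - coef(fit)["weight"] * weight

# Fraction of the original variance still present in the latent variable
var_ratio <- var(la_hip) / var(hip)
print(var_ratio)

# For a least-squares fit with intercept this equals 1 - R^2
print(1 - summary(fit)$r.squared)
```

A small ratio therefore means that most of the parent variable's variance is explained by its associated variables, while a ratio of 1 means the variable was left unaltered.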

The “IDeAEvolution” attribute can be used to verify if the algorithm achieved the target correlation goal, and the sparsity of the returned matrix.

3.4 Plotting the Evolution

Here we will use attr(body_fat_Decorrelated,"IDeAEvolution") to plot the evolution of the correlation measure and the evolution of the matrix sparsity.

op <- par(no.readonly=TRUE) # Save the graphical parameters so that par(op) can restore them later
par(mfrow=c(1,2),cex=0.5)

# Correlation
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Corr
xidx <- c(1:length(yval))
plot(xidx,yval,
     xlab="Iteration Cycle",
     ylab="Max. Pearson Correlation",
     ylim=c(0,1.0),
     main="Evolution of the maximum Correlation")
  lfit <-try(loess(yval~xidx,span=0.5));
  if (!inherits(lfit,"try-error"))
  {
    plx <- try(predict(lfit,se=TRUE))
    if (!inherits(plx,"try-error"))
    {
      lines(xidx,plx$fit,lty=1,col="red")
    }
  }

# Sparsity  
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Spar

plot(xidx,yval,
     xlab="Iteration Cycle",
     ylab="Matrix Sparsity",
     ylim=c(0,1.0),
     main="Evolution of the Matrix Sparsity")
  lfit <-try(loess(yval~xidx,span=0.5));
  if (!inherits(lfit,"try-error"))
  {
    plx <- try(predict(lfit,se=TRUE))
    if (!inherits(plx,"try-error"))
    {
      lines(xidx,plx$fit,lty=1,col="red")
    }
  }

3.5 The ILAA Transformed Data

Before exploring the properties of the ILAA results in more detail, let us first verify that the returned data set does not contain highly correlated features.

Here I’ll plot the original correlation and the correlation of the returned data set.


# The original
  par(cex=0.6,cex.main=0.85,cex.axis=0.7)
  cormat <- cor(body_fat,method="pearson")
  gplots::heatmap.2(abs(cormat),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(11)),
                    main = "Original Correlation",
                    cexRow = 0.75,
                    cexCol = 0.75,
                     srtCol=30,
                     srtRow=60,
                    key.title=NA,
                    key.xlab="|Pearson Correlation|",
                    xlab="Feature", ylab="Feature")


# The transformed
  cormat <- cor(body_fat_Decorrelated,method="pearson")
  gplots::heatmap.2(abs(cormat),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(11)),
                    main = "Correlation After ILAA",
                    cexRow = 0.75,
                    cexCol = 0.75,
                     srtCol=30,
                     srtRow=60,
                    key.title=NA,
                    key.xlab="|Pearson Correlation|",
                    xlab="Feature", ylab="Feature")
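As a numeric complement to the heat maps, the largest absolute off-diagonal correlation can be compared before and after a transformation. The sketch below uses simulated data and a hypothetical helper, `max_abs_offdiag_cor()`; a single lm() residual stands in for the decorrelation step, so this illustrates the check and not ILAA itself.

```r
# Hypothetical helper: largest absolute off-diagonal correlation of a data frame
max_abs_offdiag_cor <- function(df, method = "pearson") {
  cm <- abs(cor(df, method = method))
  diag(cm) <- 0          # ignore the trivial self-correlations
  max(cm)
}

set.seed(42)
x   <- rnorm(100)
toy <- data.frame(a = x,
                  b = x + rnorm(100, sd = 0.1),  # nearly a copy of a
                  c = rnorm(100))                # independent noise

# Stand-in decorrelation: replace b by its residual after regressing on a
toy_decor   <- toy
toy_decor$b <- residuals(lm(b ~ a, data = toy))

before <- max_abs_offdiag_cor(toy)
after  <- max_abs_offdiag_cor(toy_decor)
print(c(before = before, after = after))
```

The same helper could be applied to body_fat and body_fat_Decorrelated to compare the achieved maximum correlation against the requested threshold.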

3.6 Exploring the Transformation

attr(body_fat_Decorrelated,"UPLTM") returns the transformation matrix. The UPLTM is sparse; the following heat map of the transformation matrix shows which elements are different from zero.


  UPLTM <- attr(body_fat_Decorrelated,"UPLTM")
  
  gplots::heatmap.2(1.0*(abs(UPLTM)>0),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(2)),
                    Rowv=NULL,
                    Colv="Rowv",
                    dendrogram="none",
                    main = "Transformation matrix",
                    cexRow = 0.75,
                    cexCol = 0.75,
                   srtCol=30,
                   srtRow=60,
                    key.title=NA,
                    key.xlab="|Beta|>0",
                    xlab="Output Feature", ylab="Input Feature")
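The sparsity shown in the heat map can also be summarized as a single number. The sketch below assumes that sparsity is read as the fraction of zero entries in the matrix; the definition used internally for the "IDeAEvolution" sparsity values is not stated in this tutorial, so the two may differ. The toy matrix and its coefficients are hypothetical.

```r
# Toy transformation matrix (hypothetical coefficients)
W <- diag(3)
dimnames(W) <- list(c("Weight", "Neck", "Wrist"),
                    c("Weight", "Neck", "La_Wrist"))
W["Weight", "La_Wrist"] <- -0.012
W["Neck",   "La_Wrist"] <- -0.165

# Assumed definition: fraction of entries that are exactly zero
zero_fraction <- mean(W == 0)

# Number of observed features entering each output feature
inputs_per_output <- colSums(W != 0)

print(zero_fraction)
print(inputs_per_output)
```

Applied to the real UPLTM, the per-column counts indicate how many observed features enter each latent formula.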

3.7 The Latent Formulas

The sparsity of the UPLTM matrix can be analyzed to get the formula of each latent variable. getLatentCoefficients() and the "LatentCharFormulas" attribute of its result, attr(LatentFormulas,"LatentCharFormulas"), can be used to display these formulas.

# Get a list with the latent formulas' coefficients
LatentFormulas <- getLatentCoefficients(body_fat_Decorrelated)

# A string character with the formulas can be obtained by:
charFormulas <- attr(LatentFormulas,"LatentCharFormulas")
pander::pander(as.matrix(charFormulas))
La_BodyFat   + BodyFat + (0.120)Weight - (0.800)Abdomen - (0.480)BMI
La_Age       + Age + (0.363)Weight - (0.636)Neck - (1.117)Abdomen - (8.09e-04)Hip + (2.273)Thigh - (1.732)Knee - (5.032)Wrist - (0.864)BMI
La_Height    - (0.191)Weight + Height + (1.339)BMI
La_Neck      - (0.100)Weight + Neck + (0.172)Hip - (0.074)BMI
La_Chest     - (0.140)Weight + Chest - (0.363)Abdomen + (0.419)Hip + (0.265)Thigh - (1.082)BMI
La_Abdomen   - (0.094)Weight + Abdomen - (1.865)BMI
La_Hip       - (0.181)Weight + Hip - (0.430)BMI
La_Thigh     - (0.056)Weight + (0.137)Abdomen - (0.489)Hip + Thigh - (0.256)BMI
La_Knee      - (0.056)Weight + (0.067)Neck - (0.017)Abdomen - (0.046)Hip - (0.121)Thigh + Knee - (0.406)Wrist + (0.229)BMI
La_Ankle     - (0.035)Weight + (0.098)Neck + (0.069)Abdomen + Ankle - (0.594)Wrist - (0.128)BMI
La_Biceps    - (0.081)Weight + (0.075)Abdomen + (0.098)Hip - (0.200)Thigh + Biceps - (0.140)BMI
La_Forearm   - (0.017)Weight - (0.323)Biceps + Forearm
La_Wrist     - (0.012)Weight - (0.165)Neck + Wrist
La_BMI       - (0.111)Weight + BMI

3.8 Latent Variable Interpretation

ILAA returns the Unit Preserving Linear Transformation Matrix (UPLTM). This transformation combines the statistically significant linear associations found between feature pairs. Each significant association is modeled by a linear equation; hence, each feature can be interpreted as follows:

  • Each discovered latent variable is the residual of the observed parent variable after subtracting the linear model built from the variables associated with it. For example: \[ LaWrist = Wrist - 0.012Weight - 0.165Neck. \]

    This equation states that \(Wrist\) is associated with \(Weight\) and \(Neck\). The latent variable \(LaWrist\) is the information in \(Wrist\) that is not explained by \(Weight\) or \(Neck\).

  • The model of \(Wrist\) is therefore:

\[ Wrist \approx 0.012Weight + 0.165Neck. \]
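The residual interpretation can be checked numerically. In the sketch below, a single multivariate lm() fit on simulated data stands in for the association model of the parent variable; ILAA builds its models iteratively, so this is an illustration of the interpretation under that assumption, not of the algorithm. All variable values are simulated.

```r
set.seed(7)
n <- 250

# Simulated measurements (illustrative values only)
weight <- rnorm(n, mean = 180, sd = 25)
neck   <- 20 + 0.1 * weight + rnorm(n, sd = 1.5)
wrist  <- 8 + 0.02 * weight + 0.15 * neck + rnorm(n, sd = 0.5)

# Stand-in association model of the parent variable
fit <- lm(wrist ~ weight + neck)
b   <- coef(fit)

# Latent variable: the parent minus the weighted associated variables
la_wrist <- wrist - b["weight"] * weight - b["neck"] * neck

# The latent variable is numerically uncorrelated with both predictors
latent_cor <- c(weight = cor(la_wrist, weight),
                neck   = cor(la_wrist, neck))
print(round(latent_cor, 10))
```

The latent variable keeps the units of the parent variable because the parent enters the formula with a coefficient of one.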

3.9 The Formula Network

The graph_from_adjacency_matrix() function from igraph can be used to visualize the association between variables.

par(op)

transform <- attr(body_fat_Decorrelated,"UPLTM") != 0
colnames(transform) <- str_remove_all(colnames(transform),"La_")
transform <- abs(transform*cor(body_fat[,rownames(transform)])) # The weights are proportional to the observed correlation


VertexSize <- attr(body_fat_Decorrelated,"fscore") # The size depends on the variable independence relevance (fscore)
names(VertexSize) <- str_remove_all(names(VertexSize),"La_")
VertexSize <- 10*(VertexSize-min(VertexSize))/(max(VertexSize)-min(VertexSize)) # Normalization


gr <- graph_from_adjacency_matrix(transform,mode = "directed",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
plot(fc, gr,
     edge.width=2*E(gr)$weight,
     edge.arrow.size=0.5,
     edge.arrow.width=0.5,
     vertex.size=VertexSize,
     vertex.label.cex=0.85,
     vertex.label.dist=2,
     main="Feature Association")

par(op)

3.9.1 ILAA Solution is Data Dependent

Because the ILAA solution depends on the data sample, I’ll generate 100 solutions of the UPLTM from bootstrap resamples of the data and aggregate the non-zero coefficients. Then, I’ll plot a heat map of the frequency of hits.

par(op)

dsize <- nrow(body_fat);
taccmatrix <- cor(body_fat)*0;
for (lp in c(1:100))
{
  dmat <- ILAA(body_fat[sample(dsize,dsize,replace = TRUE),],thr=0.2)
  transform <- attr(dmat,"UPLTM") != 0 
  colnames(transform) <- str_remove_all(colnames(transform),"La_")
  taccmatrix[,colnames(transform)] <- taccmatrix[,colnames(transform)] + transform
}

gplots::heatmap.2(taccmatrix,
                    trace = "none",
                    mar = c(5,5),
                    Rowv=NULL,
                    Colv="Rowv",
                    dendrogram="none",
                    col=rev(heat.colors(11)),
                    main = "Transform Hits",
                    cexRow = 0.75,
                    cexCol = 0.75,
                   srtCol=30,
                   srtRow=60,
                    key.title=NA,
                    key.xlab="|Beta|>0",
                    xlab="Output Feature", ylab="Input Feature")

  par(op)

3.9.2 Bootstrapping ILAA

To handle data sensitivity, ILAA allows for bootstrapping estimation of the transformation matrix.


body_fat_Decorrelated <-  ILAA(body_fat,thr=0.2,verbose=TRUE,bootstrap=100)

fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333

Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860

3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959

4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735

5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631

6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501

7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584

8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478

9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421

10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375

11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384

12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343

13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335

14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291

15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252

16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207

17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217

18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221

19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189

20 <R=0.189,thr=0.200>

[ 20 ], 0.188603 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0

bootstrapping .(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.18,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.18,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00). (r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.18,w=1.00). (r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.18,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00)

   Weight   La_Ankle La_Forearm     La_Age   La_Wrist  La_Biceps
1.0000000  0.5546361  0.5164355  0.4836368  0.4016978  0.3195751

pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
Weight  La_Ankle  La_Forearm  La_Age  La_Wrist  La_Biceps  La_BodyFat
1       0.555     0.516       0.484   0.402     0.32       0.289

La_Neck  La_Knee  La_BMI  La_Thigh  La_Abdomen  La_Chest  La_Hip  La_Height
0.273    0.235    0.211   0.182     0.131       0.103     0.0993  0.0207

## Getting the formulas
LatentFormulas <- getLatentCoefficients(body_fat_Decorrelated)
charFormulas <- attr(LatentFormulas,"LatentCharFormulas")
pander::pander(as.matrix(charFormulas))
La_BodyFat   + BodyFat + (0.121)Weight - (2.50e-03)Neck - (0.805)Abdomen + (0.015)Wrist - (0.470)BMI
La_Age       - (1.27e-03)BodyFat + Age + (0.316)Weight + (0.048)Height - (0.366)Neck + (1.88e-03)Chest - (1.155)Abdomen - (0.052)Hip + (2.120)Thigh - (1.074)Knee - (0.031)Biceps + (0.027)Forearm - (5.492)Wrist - (0.493)BMI
La_Height    - (1.73e-04)BodyFat - (0.191)Weight + Height - (5.44e-05)Neck - (9.18e-04)Chest + (4.20e-05)Abdomen - (5.38e-04)Hip - (4.10e-05)Thigh + (1.05e-03)Biceps - (3.04e-03)Forearm + (3.59e-04)Wrist + (1.342)BMI
La_Neck      - (0.096)Weight + Neck + (0.170)Hip - (0.020)Wrist - (0.106)BMI
La_Chest     - (0.140)Weight + Chest - (0.372)Abdomen + (0.450)Hip + (0.190)Thigh - (1.30e-03)Biceps - (1.026)BMI
La_Abdomen   - (0.098)Weight + Abdomen - (1.869)BMI
La_Hip       - (0.181)Weight + Hip - (0.431)BMI
La_Thigh     - (0.053)Weight - (1.12e-03)Neck + (1.60e-03)Chest + (0.123)Abdomen - (0.490)Hip + Thigh - (0.015)Biceps + (7.22e-03)Wrist - (0.233)BMI
La_Knee      - (0.066)Weight + (0.037)Neck - (2.15e-04)Chest - (0.014)Abdomen + (0.015)Hip - (0.110)Thigh + Knee + (1.51e-03)Biceps - (0.236)Wrist + (0.153)BMI
La_Ankle     - (1.51e-04)BodyFat - (0.032)Weight + (0.095)Neck + (0.051)Abdomen - (2.64e-04)Hip + (4.80e-04)Thigh - (0.013)Knee + Ankle - (0.581)Wrist - (0.095)BMI
La_Biceps    - (0.077)Weight - (0.016)Neck + (0.051)Abdomen + (0.096)Hip - (0.199)Thigh + Biceps - (0.094)BMI
La_Forearm   - (0.019)Weight - (2.31e-03)Neck + (3.10e-03)Abdomen - (5.75e-03)Hip + (0.019)Thigh - (0.318)Biceps + Forearm - (0.015)Wrist - (7.20e-03)BMI
La_Wrist     + (2.84e-04)BodyFat - (0.013)Weight - (0.161)Neck - (6.66e-05)Abdomen + (1.30e-03)Hip + (1.08e-03)Thigh + Wrist - (1.26e-03)BMI
La_BMI       - (0.111)Weight + BMI


## The transformation
par(op)

transform <- attr(body_fat_Decorrelated,"UPLTM") != 0 # The non-zero coefficients
colnames(transform) <- str_remove_all(colnames(transform),"La_")  # For network analysis
transform <- abs(transform*cor(body_fat[,rownames(transform)])) # The weights are proportional to the observed correlation


gplots::heatmap.2(transform,
                    trace = "none",
                    mar = c(5,5),
                    Rowv=NULL,
                    Colv="Rowv",
                    dendrogram="none",
                    col=rev(heat.colors(11)),
                    main = "(Transform <> 0)*Correlation",
                    cexRow = 0.75,
                    cexCol = 0.75,
                   srtCol=30,
                   srtRow=60,
                    key.title=NA,
                    key.xlab="|R|",
                    xlab="Output Feature", ylab="Input Feature")

  par(op)



## Network analysis
# The vertex size will be proportional to the fscore of the IDeA procedure.
  
VertexSize <- attr(body_fat_Decorrelated,"fscore") # The size depends on the variable independence relevance (fscore)
VertexSize <- 10*(VertexSize-min(VertexSize))/(max(VertexSize)-min(VertexSize)) # Normalization



gr <- graph_from_adjacency_matrix(transform,mode = "directed",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
plot(fc, gr,
     edge.width=2*E(gr)$weight,
     edge.arrow.size=0.5,
     edge.arrow.width=0.5,
     vertex.size=VertexSize,
     vertex.label.cex=0.85,
     vertex.label.dist=2,
     main="Bootstrap: Feature Association")

par(op)

## Here we plot the final degree of correlation among output features
  cormat <- cor(body_fat_Decorrelated,method="pearson")
  gplots::heatmap.2(abs(cormat),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(11)),
                    main = "Correlation After ILAA",
                    cexRow = 0.75,
                    cexCol = 0.75,
                     srtCol=30,
                     srtRow=60,
                    key.title=NA,
                    key.xlab="|Pearson Correlation|",
                    xlab="Feature", ylab="Feature")


par(op)
diag(cormat) <- 0
pander::pander(max(abs(cormat)))

0.161

3.9.2.1 Association Plots

The following code shows the association of each latent variable with its observed parent variable, and the association of each parent variable with its linear model. For this example I will use the "VarRatio" to rank the latent variables, from the one that keeps the most original variance to the one that keeps the smallest fraction.




par(mfrow=c(1,2),cex=0.45)
fnames <- names(charFormulas)[1]
## Keep the latent variables in the order given by "VarRatio" (fraction of variance retained)
varratio <- attr(body_fat_Decorrelated,"VarRatio")
varratio <- varratio[names(varratio) %in% names(charFormulas)]

## Plotting

for (fnames in names(varratio))
{
  print(fnames)
  obsname <- str_remove(fnames,"La_")
  menv <- mean(body_fat_Decorrelated[,fnames])
  range <- max(body_fat[,obsname])-min(body_fat[,obsname])
  ylim <- c(menv-range/2,menv+range/2)
  xvals <- c(min(body_fat[,obsname]),max(body_fat[,obsname]))
  plot(body_fat[,obsname],
       body_fat_Decorrelated[,fnames],
       ylim=ylim,
       ylab=fnames,
       xlab=obsname,
       main=paste("ILAA Latent Variable:",fnames))

  lmtvals <- lm(body_fat_Decorrelated[,fnames]~body_fat[,obsname])
  pred <- lmtvals$coefficients[1] + lmtvals$coefficients[2] * xvals
  lines(x=xvals,y=pred,col="red")
  text(xvals[1]+(xvals[2]-xvals[1])/2,0.95*(ylim[2]-ylim[1])+ylim[1],sprintf("Slope= %.2f",lmtvals$coefficients[2]))

  
    
  deformula <- LatentFormulas[[fnames]]
  noInames <- names(deformula)[names(deformula) != obsname]
  predObs <- -(as.matrix(body_fat[,noInames]) %*% deformula[noInames])
  xvals <- c(min(predObs),max(predObs))
  plot(predObs,
       body_fat[,obsname],
       ylab=obsname,
       xlab=paste("Model:",obsname),
       main=paste("ILAA Generated Predictions of",obsname)
       )
  
  lmtvals <- lm(body_fat[,obsname]~predObs)
  pred <- lmtvals$coefficients[1] + lmtvals$coefficients[2] * xvals
  lines(x=xvals,y=pred,col="red")
  ylim <- c(min(body_fat[,obsname]),max(body_fat[,obsname]))

  text(xvals[1]+(xvals[2]-xvals[1])/2,0.95*(ylim[2]-ylim[1])+ylim[1],sprintf("Slope= %.2f",lmtvals$coefficients[2]))

}

[1] "La_Ankle"
[1] "La_Forearm"
[1] "La_Age"
[1] "La_Wrist"
[1] "La_Biceps"
[1] "La_BodyFat"
[1] "La_Neck"
[1] "La_Knee"
[1] "La_BMI"
[1] "La_Thigh"
[1] "La_Abdomen"
[1] "La_Chest"
[1] "La_Hip"
[1] "La_Height"


par(op)

A visual inspection of the figures above shows that some latent variables are not associated with their original parent variable, while their linear model is strongly correlated with the observed parent variable. A clear example is the last plot in the above figure.
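The relation between the two kinds of plots can be reproduced on simulated data. The sketch below assumes that a latent variable is the residual of an observed variable after subtracting a linear model of another variable, and that the variance ratio is var(latent)/var(observed). Both are illustrative assumptions, not the FRESA.CAD implementation, and the variable names are hypothetical.

```r
# Simulated data: 'hip' is almost fully explained by 'weight' (hypothetical names)
set.seed(1)
n <- 200
weight <- rnorm(n, mean = 80, sd = 12)
hip <- 60 + 0.5 * weight + rnorm(n, sd = 1.5)

# Latent variable: what is left of 'hip' after removing its linear model
fit <- lm(hip ~ weight)
la_hip <- residuals(fit)

# Fraction of the original variance kept by the latent variable
var_ratio <- var(la_hip) / var(hip)
r2_model <- summary(fit)$r.squared

# Slope of the latent variable regressed on its parent (left-hand plots)
slope_latent <- unname(coef(lm(la_hip ~ hip))[2])

cat(sprintf("VarRatio = %.3f, 1 - R2 = %.3f, slope = %.3f\n",
            var_ratio, 1 - r2_model, slope_latent))
```

In this simple setting the slope of the latent variable against its parent equals the variance ratio, so a latent variable that keeps little of the original variance shows a nearly flat line, while its model tracks the parent closely.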

3.9.3 Direct Transform Estimation

It is also possible to estimate the observations directly from the transformation matrix. To do this, set the diagonal of the transformation matrix to zero and change the sign of the product; the output is then the estimate of each observed variable from the other variables. In this example I also estimate the bias.

transform <- attr(body_fat_Decorrelated,"UPLTM")
# Set the diagonal to zero
diag(transform) <- 0

#Estimating the observation
obsestim <- -1*as.data.frame(as.matrix(body_fat[,rownames(transform)]) %*% transform)

#Bias estimation
bias <- apply(body_fat[,rownames(transform)],2,mean) - apply(obsestim[,colnames(transform)],2,mean)

#Plotting
par(mfrow=c(1,2),cex=0.45)
for (vn in names(varratio))
{
  oname <- str_remove_all(vn,"La_")
  plot(obsestim[,vn] + bias[oname],body_fat[,oname],xlab=paste("Estimated:",oname),ylab=oname,main=oname)
  indx <- obsestim[,vn]+bias[oname]
  lmtvals <- lm(body_fat[,oname] ~ indx )
  xvals <- c(min(obsestim[,vn]+ bias[oname]),max(obsestim[,vn]+ bias[oname])) 
  pred <- lmtvals$coefficients[1] + lmtvals$coefficients[2] * xvals
  lines(x=xvals,y=pred,col="red")
  ylim <- c(min(body_fat[,oname]),max(body_fat[,oname]))

  text(xvals[1]+(xvals[2]-xvals[1])/2,0.95*(ylim[2]-ylim[1])+ylim[1],
       sprintf("Slope= %.2f, R2=%3.2f",lmtvals$coefficients[2],1.0-varratio[vn])
      )
}

par(op)
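To see why zeroing the diagonal yields an estimate, the following sketch builds a small unit-diagonal transformation by hand. The matrix `W` and the variables `a` and `b` are hypothetical stand-ins for the "UPLTM" attribute and the body-fat features; this is an illustration under those assumptions, not the package code.

```r
# Simulated pair of variables: b depends linearly on a
set.seed(2)
n <- 100
a <- rnorm(n, 50, 5)
b <- 10 + 2 * a + rnorm(n)
X <- cbind(a = a, b = b)

# Hypothetical unit-diagonal transformation: column "La_b" yields b - beta * a
beta <- unname(coef(lm(b ~ a))[2])
W <- matrix(c(1, 0, -beta, 1), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "La_b")))

# Zero the diagonal and flip the sign: column "La_b" now holds beta * a
W0 <- W
diag(W0) <- 0
est <- -1 * (X %*% W0)

# The bias restores the mean of the observed variable
bias <- mean(b) - mean(est[, "La_b"])
b_hat <- est[, "La_b"] + bias

# In this toy case the estimate equals the ordinary least-squares fit
max_diff <- max(abs(b_hat - fitted(lm(b ~ a))))
print(max_diff)
```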

4 ILAA for Supervised Learning

The recommended use of the ILAA transformation in supervised learning is to split the data into training and validation sets, estimate the transformation on the training set, and then apply it to the validation set. The next lines of code split the data into training (75%) and testing (25%) sets.

4.1 Split into Training Testing Sets


# 75% for training 25% for testing 
set.seed(2)
trainsamples <- sample(nrow(body_fat),floor(3*nrow(body_fat)/4))

trainingset <- body_fat[trainsamples,]
testingset <- body_fat[-trainsamples,]
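As a quick sanity check of the split, the sketch below repeats the sampling on a hypothetical row count of 251 (chosen so that the set sizes match the class tables printed later in this tutorial) and verifies that the two index sets are disjoint and cover all rows.

```r
set.seed(2)
n <- 251                                  # hypothetical number of rows
train_idx <- sample(n, floor(3 * n / 4))  # 75% of the row indices
test_idx <- setdiff(seq_len(n), train_idx)

# The sets are disjoint and together cover every row
stopifnot(length(intersect(train_idx, test_idx)) == 0)
stopifnot(length(train_idx) + length(test_idx) == n)

cat("train:", length(train_idx), "test:", length(test_idx), "\n")
```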

4.2 Data Train Analysis and Prediction of the Test Set

By default, ILAA() transforms are blind to outcome associations, but in supervised learning the user is free to specify a target outcome to drive the shape of the transformation matrix. Outcome-driven transformations try to keep the features that are strongly associated with the target unaltered.

The predictDecorrelate() function can be used to transform any new dataset using an ILAA-transformed object.

The next code snippet shows the process of transforming the training set and then using the returned object to transform the testing set using both outcome-blind and outcome-driven transformations.


## Outcome-blind
body_fat_Decorrelated_train <- ILAA(trainingset,
                                    thr=0.2,
                                    Outcome="BodyFat",
                                    verbose=TRUE)

fast | LM |
Weight
       Age     Weight     Height       Neck      Chest    Abdomen
0.07142857 1.00000000 0.14285714 0.50000000 0.78571429 0.71428571

Included: 14 , Uni p: 0.01071429 , Base Size: 1 , Rcrit: 0.1676986

1 <R=0.940,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.800>, Top: 1< 4 >Fa= 2,<|><>Tot Used: 8 , Added: 4 , Zero Std: 0 , Max Cor: 0.852

3 <R=0.852,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.969

4 <R=0.969,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.779

5 <R=0.779,thr=0.700>, Top: 1< 2 >Fa= 2,<|><>Tot Used: 11 , Added: 2 , Zero Std: 0 , Max Cor: 0.694

6 <R=0.694,thr=0.600>, Top: 1< 2 >Fa= 2,<|><>Tot Used: 13 , Added: 2 , Zero Std: 0 , Max Cor: 0.462

7 <R=0.462,thr=0.400>, Top: 2< 1 >Fa= 3,<|><>Tot Used: 13 , Added: 2 , Zero Std: 0 , Max Cor: 0.394

8 <R=0.394,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 13 , Added: 4 , Zero Std: 0 , Max Cor: 0.374

9 <R=0.374,thr=0.300>, Top: 4< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.351

10 <R=0.351,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.360

11 <R=0.360,thr=0.300>, Top: 2< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.355

12 <R=0.355,thr=0.300>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.294

13 <R=0.294,thr=0.200>, Top: 4< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.304

14 <R=0.304,thr=0.300>, Top: 1< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.254

15 <R=0.254,thr=0.200>, Top: 4< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.248

16 <R=0.248,thr=0.200>, Top: 2< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.243

17 <R=0.243,thr=0.200>, Top: 1< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.198

18 <R=0.198,thr=0.200>

[ 18 ], 0.1979781 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 14 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated_train,"drivingFeatures"))

Weight, Hip, BMI, Chest, Abdomen, Thigh, Knee, Neck, Biceps, Wrist, Forearm, Ankle, Height and Age


body_fat_Decorrelated_test <- predictDecorrelate(body_fat_Decorrelated_train
                                                 ,testingset)

## Outcome-driven transformation
body_fat_Decorrelated_trainD <- ILAA(trainingset,
                                     thr=0.2,
                                     Outcome="BodyFat",
                                     drivingFeatures="BodyFat",
                                     verbose=TRUE)

fast | LM |
     Abdomen          BMI        Chest          Hip       Weight        Thigh
2.703526e-46 1.013764e-33 4.927811e-28 1.286637e-24 6.362965e-22 1.567169e-18

Abdomen
       Age     Weight     Height       Neck      Chest    Abdomen
0.14285714 0.71428571 0.07142857 0.50000000 0.85714286 1.00000000

Included: 14 , Uni p: 0.01071429 , Base Size: 1 , Rcrit: 0.1676986

1 <R=0.940,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 1 , Zero Std: 0 , Max Cor: 0.783

3 <R=0.783,thr=0.700>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 8 , Added: 3 , Zero Std: 0 , Max Cor: 0.719

4 <R=0.719,thr=0.700>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.695

5 <R=0.695,thr=0.600>, Top: 2< 1 >Fa= 3,<|><>Tot Used: 11 , Added: 3 , Zero Std: 0 , Max Cor: 0.544

6 <R=0.544,thr=0.500>, Top: 3< 2 >Fa= 5,<|><>Tot Used: 12 , Added: 4 , Zero Std: 0 , Max Cor: 0.469

7 <R=0.469,thr=0.400>, Top: 5< 1 >Fa= 5,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.594

8 <R=0.594,thr=0.500>, Top: 2< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.623

9 <R=0.623,thr=0.600>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.733

10 <R=0.733,thr=0.700>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.580

11 <R=0.580,thr=0.500>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.755

12 <R=0.755,thr=0.700>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.458

13 <R=0.458,thr=0.400>, Top: 2< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.530

14 <R=0.530,thr=0.500>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.420

15 <R=0.420,thr=0.400>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.400

16 <R=0.400,thr=0.300>, Top: 5< 2 >Fa= 8,<|><>Tot Used: 14 , Added: 6 , Zero Std: 0 , Max Cor: 0.346

17 <R=0.346,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.306

18 <R=0.306,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.292

19 <R=0.292,thr=0.200>, Top: 3< 2 >Fa= 8,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.301

20 <R=0.301,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.300

21 <R=0.300,thr=0.200>, Top: 3< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.335

22 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.257

23 <R=0.257,thr=0.200>, Top: 2< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.239

24 <R=0.239,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.200

25 <R=0.200,thr=0.200>

[ 25 ], 0.1999494 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 14 , Outcome Base: 13


pander::pander(attr(body_fat_Decorrelated_trainD,"drivingFeatures"))

Abdomen, BMI, Chest, Hip, Weight, Thigh, Knee, Neck, Biceps, Forearm, Wrist, Ankle, Age and Height


body_fat_Decorrelated_testD <- predictDecorrelate(body_fat_Decorrelated_trainD
                                                  ,testingset)

4.3 Train a Regression Model for Body Fat Prediction

Once we have a transformed training and testing set, we can proceed to train a linear model of the body fat content. For this example we will use the LASSO_1SE() function of the FRESA.CAD package to model the \(BodyFat\) using all the variables in the transformed training set.


## Outcome-Blind
modelBodyFat <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_train)
pander::pander(as.matrix(modelBodyFat$coef))
(Intercept) -3.592
Weight 0.150
La_Abdomen 0.718
La_BMI 1.515

## Outcome-Driven
modelBodyFatD <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_trainD)
pander::pander(as.matrix(modelBodyFatD$coef))
(Intercept) -38.4288
La_Weight -0.0489
La_Neck -0.2434
Abdomen 0.6053
La_Hip -0.0883

The printed beta coefficients of the models show that the LASSO models are different between the Outcome-driven and outcome-blind ILAA methods.

4.3.0.1 Multicollinearity Analysis

Here we check the Variance inflation factor (VIF) on the train and testing sets

frm <- paste("BodyFat~",str_flatten(modelBodyFat$selectedfeatures," + "))

X <- model.matrix(formula(frm),body_fat_Decorrelated_train);
mc <- multiCol(X)
title("Train VIF")

vifd <- VIF(X)
vifx <-vif(lm(formula(frm),body_fat_Decorrelated_train))

X <- model.matrix(formula(frm),body_fat_Decorrelated_test);
mc <- multiCol(X)
title("Test VIF")


frm <- paste("BodyFat~",str_flatten(modelBodyFatD$selectedfeatures," + "))
X <- model.matrix(formula(frm),body_fat_Decorrelated_trainD);
mc <- multiCol(X)
title("Driven: Train VIF")


X <- model.matrix(formula(frm),body_fat_Decorrelated_testD);

mc <- multiCol(X)
title("Driven: Test VIF")

The plots indicate that neither model has collinearity issues.
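For readers who want to see what the VIF measures, here is a base-R sketch on simulated predictors (the names `x1`, `x2`, `x3` are hypothetical). It uses the textbook definition VIF_j = 1/(1 - R_j^2) and the equivalent diagonal of the inverse correlation matrix; it does not reproduce the multiColl or car implementations.

```r
# Simulated predictors: x2 is nearly collinear with x1, x3 is independent
set.seed(3)
n <- 150
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
X <- cbind(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others
vif_by_regression <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})
names(vif_by_regression) <- colnames(X)

# Equivalent computation: diagonal of the inverse correlation matrix
vif_by_inverse <- diag(solve(cor(X)))

print(round(rbind(vif_by_regression, vif_by_inverse), 2))
```

Values close to 1 indicate no collinearity, which is what the decorrelated features are expected to show.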

4.3.1 The Model Coefficients in the Observed Space

The FRESA.CAD package provides a handy function, getObservedCoef(), to get the linear beta coefficients of the observed variables from the transformed object and the fitted model. The next code shows the procedure.


# Get the coefficients in the observed space for the outcome-blind
observedCoef <- getObservedCoef(body_fat_Decorrelated_train,modelBodyFat)
pander::pander(as.matrix(observedCoef$coefficients))
(Intercept) -3.59177
Weight -0.00299
Chest -0.36091
Abdomen 0.71823
Hip -0.26029
BMI 0.75375



# The outcome-driven coefficients
observedCoefD <- getObservedCoef(body_fat_Decorrelated_trainD,modelBodyFatD)
pander::pander(as.matrix(observedCoefD$coefficients))
(Intercept) -38.4288
Weight -0.0489
Neck -0.1211
Abdomen 0.6835
Hip 0.0569
BMI 0.0789
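The idea behind mapping coefficients back to the observed space can be sketched with plain matrix algebra: if the latent features are Z = XW and the model is y = b0 + Zβ, then the observed-space coefficients are Wβ. The example below uses simulated data and a hand-built matrix `W` (both hypothetical); it illustrates the algebra only and is not the getObservedCoef() implementation.

```r
# Simulated observed features (hypothetical names) and outcome
set.seed(4)
n <- 120
X <- cbind(Weight = rnorm(n, 80, 10), Hip = rnorm(n, 100, 7))
X[, "Hip"] <- X[, "Hip"] + 0.4 * X[, "Weight"]
y <- 5 + 0.3 * X[, "Weight"] + 0.2 * X[, "Hip"] + rnorm(n)

# Hypothetical transformation: La_Hip = Hip - b * Weight
b <- unname(coef(lm(X[, "Hip"] ~ X[, "Weight"]))[2])
W <- matrix(c(1, 0, -b, 1), nrow = 2,
            dimnames = list(c("Weight", "Hip"), c("Weight", "La_Hip")))
Z <- X %*% W

# Fit in the latent space, then map the betas back: beta_observed = W %*% beta_latent
fit <- lm(y ~ Z)
beta_latent <- coef(fit)[-1]
beta_observed <- drop(W %*% beta_latent)

# Predictions from both spaces coincide
pred_latent <- fitted(fit)
pred_observed <- coef(fit)[1] + drop(X %*% beta_observed)
print(max(abs(pred_latent - pred_observed)))

# For ordinary least squares (not for LASSO) this also matches a direct fit on X
direct <- unname(coef(lm(y ~ X))[-1])
print(all.equal(unname(beta_observed), direct))
```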

4.3.1.1 Multicollinearity Analysis on the Observed Space

Here we check the Variance inflation factor (VIF) on the train and testing sets using the observed variables

X <- model.matrix(formula(observedCoef$formula),trainingset);
mc <- multiCol(X)
title("Observed Training VIF")


X <- model.matrix(formula(observedCoef$formula),testingset);
mc <- multiCol(X)
title("Observed Testing VIF")


X <- model.matrix(formula(observedCoefD$formula),trainingset);
mc <- multiCol(X)
title("Driven: Observed Training VIF")


X <- model.matrix(formula(observedCoefD$formula),testingset);
mc <- multiCol(X)
title("Driven: Observed Testing VIF")

The results indicate that the models created using the observed variables have strong collinearity issues.

4.3.2 Predict Using the Transformed Data-Set

The user can predict the BodyFat content using the handy predict() function. After that we can measure the testing performance using the predictionStats_regression() function.


## Outcome-Blind
predicBodyFat <- predict(modelBodyFat,body_fat_Decorrelated_test)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             predicBodyFat),
                                       "Body Fat: Blind")

Body Fat: Blind

pander::pander(rmetrics)
  • corci:

    cor    
    0.811 0.706 0.882
  • biasci: -0.0537, -1.2272 and 1.1199

  • RMSEci: 4.66, 3.97 and 5.64

  • spearmanci:

    50% 2.5% 97.5%
    0.818 0.699 0.895
  • MAEci:

    50% 2.5% 97.5%
    3.71 3.08 4.44
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    10.8 61 7.29e-16 * * * two.sided 0.811

## Outcome-Driven
predicBodyFatD <- predict(modelBodyFatD,body_fat_Decorrelated_testD)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             predicBodyFatD),
                                       "Body Fat: Driven")

Body Fat: Driven

pander::pander(rmetrics)
  • corci:

    cor    
    0.821 0.72 0.888
  • biasci: 0.184, -0.967 and 1.335

  • RMSEci: 4.57, 3.90 and 5.54

  • spearmanci:

    50% 2.5% 97.5%
    0.832 0.716 0.904
  • MAEci:

    50% 2.5% 97.5%
    3.52 2.83 4.29
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    11.2 61 1.68e-16 * * * two.sided 0.821

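The reported metrics indicate that the model predictions are highly correlated with the measured \(BodyFat\): the Pearson correlation is 0.811 for the outcome-blind model and 0.821 for the outcome-driven model.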

4.3.3 Prediction Using the Observed Features

An ILAA user has the option to predict the \(BodyFat\) content from the observed testing set using the computed beta coefficients. The next lines of code show how to do the prediction using the model.matrix() R function and the matrix product %*% :



predicBodyFatObst <- model.matrix(formula(observedCoef$formula),testingset) %*% observedCoef$coefficients

plot(predicBodyFatObst,
     predicBodyFat,
     xlab="Observed Space",
     ylab="Transformed Space",
     main="Test Predictions: Observed vs. Transformed")

The last plot shows the expected result: that both predictions are identical.

4.3.4 Comparison to Raw Model

A last experiment is to compare a LASSO model created from the observed features with the model created from the transformed observations.

The next lines of code compute the linear model using LASSO from the original observed data. Then, it computes the predicted performance.

rawmodelBodyFat <- LASSO_1SE(BodyFat~.,trainingset)
pander::pander(rawmodelBodyFat$coef)
(Intercept) Height Abdomen
-23.2 -0.189 0.601

rawpredicBodyFat <- predict(rawmodelBodyFat,testingset)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             rawpredicBodyFat),"Body Fat")

Body Fat

pander::pander(rmetrics)
  • corci:

    cor    
    0.808 0.701 0.88
  • biasci: 0.169, -1.020 and 1.358

  • RMSEci: 4.72, 4.02 and 5.72

  • spearmanci:

    50% 2.5% 97.5%
    0.813 0.687 0.894
  • MAEci:

    50% 2.5% 97.5%
    3.66 3 4.43
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    10.7 61 1.13e-15 * * * two.sided 0.808

The evaluation of the testing results indicates that the predictions of the model built from the observed features have a correlation of 0.808 with the measured \(BodyFat\). This is slightly lower than, but not statistically different from, the correlation obtained by the outcome-blind model estimated in the transformed space: ( \(\rho _t=0.811\) vs. \(\rho _o=0.808\) )
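A rough way to compare the three models is to check whether their confidence intervals overlap. The sketch below copies the Pearson correlations and 95% confidence intervals from the `corci` outputs printed above; the helper `overlaps()` is a hypothetical name, and interval overlap is only a heuristic, not a formal test for the difference between correlations.

```r
# Pearson correlations and 95% CIs copied from the corci outputs above
ci <- rbind(raw    = c(cor = 0.808, lower = 0.701, upper = 0.880),
            blind  = c(cor = 0.811, lower = 0.706, upper = 0.882),
            driven = c(cor = 0.821, lower = 0.720, upper = 0.888))

# Two intervals overlap when each lower bound is below the other's upper bound
overlaps <- function(a, b) {
  unname(a["lower"] <= b["upper"] && b["lower"] <= a["upper"])
}

raw_vs_blind <- overlaps(ci["raw", ], ci["blind", ])
raw_vs_driven <- overlaps(ci["raw", ], ci["driven", ])
print(c(raw_vs_blind = raw_vs_blind, raw_vs_driven = raw_vs_driven))
```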

4.3.5 Comparing the Feature Significance on the Models

The main advantage of the ILAA transformation is that the returned latent variables are not collinear; hence, the statistical significance of the beta coefficients is not affected by multicollinearity. The next code snippet shows how to get the beta coefficients using the lm() and summary.lm() functions.

The inspection of the summary results clearly shows that most of the beta coefficients on the transformed data set are significant.


## Raw Model
par(mfrow=c(2,2),cex=0.5)
rawlm <- lm(BodyFat~.,
            trainingset[,c("BodyFat",names(rawmodelBodyFat$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.733 9.2028 -0.514 6.08e-01
Height -0.580 0.1324 -4.384 1.95e-05 * * *
Abdomen 0.699 0.0331 21.117 3.61e-51 * * *
plot(rawlm)


## Outcome-Blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,body_fat_Decorrelated_train[,c("BodyFat",names(modelBodyFat$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.941 3.2000 -2.48 1.40e-02 *
Weight 0.180 0.0122 14.79 3.95e-33 * * *
La_Abdomen 0.952 0.0956 9.96 5.91e-19 * * *
La_BMI 2.079 0.2065 10.07 2.88e-19 * * *
plot(Delm)


## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,
           body_fat_Decorrelated_trainD[,c("BodyFat",names(modelBodyFatD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -44.155 12.8683 -3.43 7.43e-04 * * *
La_Weight -0.131 0.0442 -2.96 3.49e-03 * *
La_Neck -0.655 0.2221 -2.95 3.59e-03 * *
Abdomen 0.671 0.0320 21.00 1.29e-50 * * *
La_Hip -0.297 0.1019 -2.92 3.98e-03 * *
plot(Delm)


par(op)
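The effect of collinearity on coefficient significance can be illustrated with two simulated predictors. In the sketch below the "latent" predictor is simply the residual of `x2` after regressing it on `x1`; this is an assumption that mimics the spirit of the transformation, not the ILAA algorithm, and all names are hypothetical.

```r
# Simulated collinear predictors
set.seed(5)
n <- 200
x1 <- rnorm(n)
x2 <- 0.95 * x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
y <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

# Raw model on the collinear predictors
raw_fit <- lm(y ~ x1 + x2)

# Latent-style predictor: the part of x2 not explained by x1
la_x2 <- residuals(lm(x2 ~ x1))
latent_fit <- lm(y ~ x1 + la_x2)

# Standard errors of the coefficients in both models
se_raw <- summary(raw_fit)$coefficients[, "Std. Error"]
se_latent <- summary(latent_fit)$coefficients[, "Std. Error"]
print(round(rbind(raw = se_raw, latent = se_latent), 4))
```

Both models produce the same fitted values, and the coefficient of the residualized predictor keeps its standard error. The standard error of `x1` shrinks because `x1` no longer shares variance with the other predictor, which is why its significance is easier to interpret.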

4.4 Train a Logistic Model for Overweight Prediction

This last experiment showcases the effect of data transformation on logistic modeling. The experiment starts by creating a data frame that does not include the \(BMI\), \(Height\), and \(Weight\) variables. The target outcome is to identify whether the person is overweight (BMI>=25) or normal. The next lines of code compute the new data frames and remove the above-mentioned variables.

4.4.1 Data Conditioning

First Remove Height and Weight from Training and Testing Sets


trainingsetBMI <- trainingset[,!(colnames(trainingset) %in% c("Weight","Height"))]
testingsetBMI <- testingset[,!(colnames(testingset) %in% c("Weight","Height"))]
trainingsetBMI$Overweight <- 1*(trainingsetBMI$BMI>=25)
testingsetBMI$Overweight <- 1*(testingsetBMI$BMI>=25)
trainingsetBMI$BMI <- NULL
testingsetBMI$BMI <- NULL

# The number of subjects
pander::pander(table(trainingsetBMI$Overweight))
0 1
96 92
pander::pander(table(testingsetBMI$Overweight))
0 1
29 34

## The outcome-blind transformation
OW_Decorrelated_train <- ILAA(trainingsetBMI,
                              thr=0.2,
                              Outcome="Overweight",
                              verbose=TRUE)

fast | LM |
Chest
   BodyFat        Age       Neck      Chest    Abdomen        Hip
0.58333333 0.08333333 0.50000000 1.00000000 0.91666667 0.83333333

Included: 12 , Uni p: 0.0125 , Base Size: 1 , Rcrit: 0.1634602

1 <R=0.917,thr=0.900>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 3 , Added: 1 , Zero Std: 0 , Max Cor: 0.783

3 <R=0.783,thr=0.700>, Top: 1< 4 >Fa= 1,<|><>Tot Used: 7 , Added: 4 , Zero Std: 0 , Max Cor: 0.706

4 <R=0.706,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 7 , Added: 1 , Zero Std: 0 , Max Cor: 0.697

5 <R=0.697,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 10 , Added: 3 , Zero Std: 0 , Max Cor: 0.640

6 <R=0.640,thr=0.600>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.567

7 <R=0.567,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.497

8 <R=0.497,thr=0.400>, Top: 4< 1 >Fa= 5,<|><>Tot Used: 11 , Added: 4 , Zero Std: 0 , Max Cor: 0.425

9 <R=0.425,thr=0.400>, Top: 1< 1 >Fa= 5,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.381

10 <R=0.381,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.382

11 <R=0.382,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.295

12 <R=0.295,thr=0.200>, Top: 4< 2 >Fa= 7,<|><>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.299

13 <R=0.299,thr=0.200>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.257

14 <R=0.257,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.203

15 <R=0.203,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.189

16 <R=0.189,thr=0.200>

[ 16 ], 0.1894651 Decor Dimension: 12 Nused: 12 . Cor to Base: 11 , ABase: 12 , Outcome Base: 0


OW_Decorrelated_test <- predictDecorrelate(OW_Decorrelated_train,testingsetBMI)

## The outcome-driven transformation

OW_Decorrelated_trainD <- ILAA(trainingsetBMI,
                               thr=0.2,
                               Outcome="Overweight",
                               drivingFeatures="Overweight",
                               verbose=TRUE)

fast | LM |
       Chest      Abdomen          Hip         Neck        Thigh       Biceps
6.940373e-28 9.802035e-28 5.772864e-23 9.353353e-19 1.810521e-18 1.810521e-18

Chest
   BodyFat        Age       Neck      Chest    Abdomen        Hip
0.50000000 0.08333333 0.75000000 1.00000000 0.91666667 0.83333333

Included: 12 , Uni p: 0.0125 , Base Size: 1 , Rcrit: 0.1634602

1 <R=0.917,thr=0.900>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 3 , Added: 1 , Zero Std: 0 , Max Cor: 0.783

3 <R=0.783,thr=0.700>, Top: 1< 4 >Fa= 1,<|><>Tot Used: 7 , Added: 4 , Zero Std: 0 , Max Cor: 0.706

4 <R=0.706,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 7 , Added: 1 , Zero Std: 0 , Max Cor: 0.697

5 <R=0.697,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 10 , Added: 3 , Zero Std: 0 , Max Cor: 0.640

6 <R=0.640,thr=0.600>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.567

7 <R=0.567,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.497

8 <R=0.497,thr=0.400>, Top: 4< 1 >Fa= 5,<|><>Tot Used: 11 , Added: 4 , Zero Std: 0 , Max Cor: 0.425

9 <R=0.425,thr=0.400>, Top: 1< 1 >Fa= 5,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.381

10 <R=0.381,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.382

11 <R=0.382,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.295

12 <R=0.295,thr=0.200>, Top: 4< 2 >Fa= 7,<|><>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.299

13 <R=0.299,thr=0.200>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.257

14 <R=0.257,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.203

15 <R=0.203,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.189

16 <R=0.189,thr=0.200>

[ 16 ], 0.1894651 Decor Dimension: 12 Nused: 12 . Cor to Base: 11 , ABase: 12 , Outcome Base: 12


OW_Decorrelated_testD <- predictDecorrelate(OW_Decorrelated_trainD,testingsetBMI)

The last code snippet transforms the observed features using ILAA with Overweight set as the target variable: first with an outcome-blind transformation, whose convergence is not affected by the target outcome, and then with an outcome-driven transformation.

4.4.2 The Logistic Model

LASSO_1SE with a binomial family is used to compute the logistic model of overweight.


## Outcome-blind
modelOverweight <- LASSO_1SE(Overweight~.,
                             OW_Decorrelated_train,
                             family="binomial")
pander::pander(as.matrix(modelOverweight$coef))
(Intercept) -36.9601
Chest 0.3877
La_Abdomen 0.0717
La_Hip 0.0351

## Outcome-driven
modelOverweightD <- LASSO_1SE(Overweight~.,
                              OW_Decorrelated_trainD,
                              family="binomial")
pander::pander(as.matrix(modelOverweightD$coef))
(Intercept) -40.3810
Chest 0.4225
La_Abdomen 0.0886
La_Hip 0.0561

4.4.3 The Model Coefficients in the Observed Space

Once the logistic model is created in the transformed space, we can compute the beta coefficients for each one of the observed variables.


# Get the coefficients in the observed space
observedCoef <- getObservedCoef(OW_Decorrelated_train,modelOverweight)
pander::pander(as.matrix(observedCoef$coefficients))
(Intercept) -36.9601
Chest 0.3070
Abdomen 0.0717
Hip -0.0031

4.4.4 Predict Using the Transformed Data Set

The predictions of the testing set can be computed using the handy predict() function. The testing results can then be evaluated using the predictionStats_binary() function.


## Outcome-blind
predicOverweight <- predict(modelOverweight,OW_Decorrelated_test)
pr <- predictionStats_binary(cbind(OW_Decorrelated_test$Overweight,
                                   predicOverweight),"Overweight: Blind")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.841 0.746 0.921
  • senci:

    50% 2.5% 97.5%
    0.841 0.75 0.921
  • aucci:

    50% 2.5% 97.5%
    0.841 0.75 0.921
  • berci:

    50% 2.5% 97.5%
    0.159 0.0788 0.25
  • preci:

    50% 2.5% 97.5%
    0.839 0.744 0.921
  • F1ci:

    50% 2.5% 97.5%
    0.838 0.744 0.92

## Outcome-Driven
predicOverweightD <- predict(modelOverweightD,OW_Decorrelated_testD)
pr <- predictionStats_binary(cbind(OW_Decorrelated_testD$Overweight,
                                   predicOverweightD),"Overweight: Driven")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.841 0.746 0.921
  • senci:

    50% 2.5% 97.5%
    0.843 0.75 0.927
  • aucci:

    50% 2.5% 97.5%
    0.843 0.75 0.927
  • berci:

    50% 2.5% 97.5%
    0.157 0.0732 0.25
  • preci:

    50% 2.5% 97.5%
    0.842 0.747 0.923
  • F1ci:

    50% 2.5% 97.5%
    0.841 0.744 0.921

4.4.5 Prediction Using the Observed Features

The prediction of the testing set can be done using model.matrix() and the matrix product %*%.


predicOverweightObst <- model.matrix(formula(observedCoef$formula),testingsetBMI) %*% observedCoef$coefficients
#predicOverweightObst <- 1.0/(1.0 + exp(-predicOverweightObst));

plot(predicOverweightObst,predicOverweight,
     xlab="Observed",
     ylab="Transformed",
     main="Test predictions: Observed vs. Transformed")

The last plot shows the expected result: both predictions are identical.
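The matrix product above returns the linear predictor, which for a logistic model is on the log-odds scale; the commented-out line in the snippet converts it to a probability. As a small base-R sketch with hypothetical log-odds values, the manual sigmoid and the built-in plogis() give the same result, and the conversion preserves the ranking of the subjects:

```r
# Hypothetical linear predictors (log-odds)
eta <- c(-4, -1, 0, 1, 4)

# Manual sigmoid, as in the commented-out line, and the base-R equivalent
p_manual <- 1 / (1 + exp(-eta))
p_plogis <- plogis(eta)

# The transformation is monotone, so the ordering of the subjects is unchanged
same_order <- identical(order(eta), order(p_plogis))

print(round(p_plogis, 3))
print(same_order)
```

Because the ordering is unchanged, rank-based metrics such as the ROC AUC are the same on either scale, while a log-odds of zero corresponds to a probability of 0.5.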

4.4.6 Comparison to Raw Model

To showcase the advantage of transformed modeling vs. raw modeling, here I’ll estimate the logistic model from the observed variables and contrast it with the model generated from the transformed space.

The next lines of code compute the logistic model and display its testing performance:

##Training
rawmodelOverweight <- LASSO_1SE(Overweight~.,
                                trainingsetBMI,
                                family="binomial")
pander::pander(rawmodelOverweight$coef)
(Intercept) BodyFat Chest Abdomen Thigh Ankle Biceps
-39.9 0.0108 0.206 0.147 0.0275 0.064 0.0818
## Predict
rawpredicOverweight <- predict(rawmodelOverweight,testingsetBMI)
pr <- predictionStats_binary(cbind(testingsetBMI$Overweight,
                                   rawpredicOverweight),"Overweight")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.873 0.778 0.952
  • senci:

    50% 2.5% 97.5%
    0.873 0.779 0.948
  • aucci:

    50% 2.5% 97.5%
    0.873 0.779 0.948
  • berci:

    50% 2.5% 97.5%
    0.127 0.052 0.221
  • preci:

    50% 2.5% 97.5%
    0.877 0.788 0.952
  • F1ci:

    50% 2.5% 97.5%
    0.872 0.777 0.95

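The model created from the observed data has a ROC AUC of 0.873 (95% CI: 0.779 to 0.948), which is not statistically different from the AUC of the transformed models (0.841 and 0.843), given the overlapping confidence intervals.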

4.4.7 Comparing the Feature Significance on the Models

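These last lines of code compute the significance of the beta coefficients for both the observed model and the latent-based models. Note that the raw model is fitted on the training set, whereas the latent-based models below are fitted on the smaller transformed testing sets, so the p-values are not directly comparable. In the displayed output, Abdomen and Biceps are the statistically significant predictors of the raw model, while Chest is the only statistically significant predictor of the latent-based models.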


par(mfrow=c(2,2),cex=0.5)

## Raw model
rawlm <- lm(Overweight~.,trainingsetBMI[,c("Overweight",names(rawmodelOverweight$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.96182 0.40972 -9.669 4.30e-18 * * *
BodyFat 0.00588 0.00492 1.194 2.34e-01
Chest 0.01521 0.00781 1.947 5.31e-02
Abdomen 0.01486 0.00743 2.000 4.70e-02 *
Thigh -0.00120 0.00807 -0.149 8.82e-01
Ankle 0.02388 0.01761 1.356 1.77e-01
Biceps 0.03003 0.01229 2.444 1.55e-02 *
plot(rawlm)


## Outcome-blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_test[,c("Overweight",names(modelOverweight$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.16781 0.83956 -2.582 1.23e-02 *
Chest 0.03524 0.00548 6.434 2.44e-08 * * *
La_Abdomen 0.01911 0.01274 1.500 1.39e-01
La_Hip -0.00373 0.00997 -0.374 7.10e-01
plot(Delm)


## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_testD[,c("Overweight",names(modelOverweightD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.16781 0.83956 -2.582 1.23e-02 *
Chest 0.03524 0.00548 6.434 2.44e-08 * * *
La_Abdomen 0.01911 0.01274 1.500 1.39e-01
La_Hip -0.00373 0.00997 -0.374 7.10e-01
plot(Delm)

5 Conclusion

In conclusion, ILAA (Iterative Linear Association Analysis) is an unsupervised computational method that estimates linear transformation matrices. These matrices convert a data set into a latent-variable space with a user-controlled degree of correlation. This tutorial has demonstrated the practical use of ILAA, covering its functions for estimating, predicting, and inspecting transformations. These capabilities are useful in supervised learning scenarios, including regression and logistic models.